Skip to content

CSS v1.2.0: Phase A — parametric coverage additions#6938

Open
ichinaski wants to merge 21 commits into
mainfrom
inigo/css-spec-coverage
Open

CSS v1.2.0: Phase A — parametric coverage additions#6938
ichinaski wants to merge 21 commits into
mainfrom
inigo/css-spec-coverage

Conversation

@ichinaski
Copy link
Copy Markdown
Contributor

@ichinaski ichinaski commented May 14, 2026

First slice of system-tests coverage for the gaps identified in the CSS v1.2.0 status report.

What's added

Four parametric tests in tests/parametric/test_library_tracestats.py:

Test Spec Asserts
test_http_method_endpoint_TS011 §5 ClientGroupedStats HTTPMethod / HTTPEndpoint populated from http.method / http.endpoint / http.route
test_payload_metadata_TS012 §3 ClientStatsPayload Hostname, Env, Version, RuntimeID, Sequence populated; Service present (payload-level or per-bucket)
test_agent_populated_fields_empty_TS013 §3 ClientStatsPayload ContainerID, Tags, ImageTag, AgentAggregation, ProcessTagsHash absent (agent-populated)
test_partial_version_excluded_TS014 §7 Span Exclusions Spans with _dd.partial_version set don't contribute to stats

A _find_raw_v06_stats helper reads the raw msgpack payload, since the decoded V06StatsAggr view is narrower than the spec.

Parametric harness fix (python)

utils/build/docker/python/parametric/apm_test_client/server.py/trace/stats/flush was relying on SpanStatsProcessorV06, which dd-trace-py removed when CSS moved to libdatadog. The endpoint now falls back to writer.on_shutdown() + writer.recreate() when the legacy processor is absent. Old behavior preserved otherwise.

Per-SDK summary (review focus)

  • python — harness fix above; passes all four tests.
  • golang — TS012 marked missing_feature: payload-level RuntimeID not set (stats.go:96-103). Other three pass.
  • java — TS014 marked missing_feature: exclusion uses internal longRunningVersion (ConflatingMetricsAggregator.java:308), not the spec's _dd.partial_version.
  • dotnet — TS014 marked missing_feature: _dd.partial_version spans not excluded.
  • nodejs — five SDK gaps marked per-test (TS001/TS003/TS005/TS006/TS007/TS014); see manifests/nodejs.yml for details on each. dd-trace-js CSS is implemented but diverges from spec on Resource field, aggregation keys, error/success separation, P0 dropping, and partial_version filtering.
  • php — class-level missing_feature (>=1.19.0): dd-trace-php ships CSS but libdatadog's sidecar holds buckets <20s (buffer_len=2 × bucket_size=10s) and dd_trace_synchronous_flush passes force=false, so parametric tests never observe /v0.6/stats. No userland force-flush API exists.
  • rust / ruby / cpp — all four marked missing_feature: CSS not implemented in tracer.

Spec gaps surfaced (follow-ups, not addressed here)

dd-trace-go:

  1. Payload-level RuntimeID never set (ddtrace/tracer/stats.go:96-103)
  2. Payload-level Service never set; only per-bucket (stats.go:181)
  3. HTTPEndpoint reads http.endpoint, inconsistent with OTel's http.route
  4. /info version field not parsed in infoResponse
  5. Configurable retry on stats send (stats.go:266) contradicts spec
  6. gRPC status code extraction delegated to agent
  7. api.errors metric not emitted from stats endpoint error path

dd-trace-js (Node.js): see per-test manifest reasons (Resource = span name, single bucket for errors/success, no P0 drop on sample_rate=0, no _dd.partial_version filter).

dd-trace-php: no userland force-flush API; bucket aging blocks short-lived tests.

dd-trace-java: TS014 uses internal exclusion field instead of spec's _dd.partial_version.

Phases B–E (follow-ups on this branch)

  • B — trace filters (filter_tags, filter_tags_regex, ignore_resources)
  • C — sampler interactions, extended obfuscation (Cassandra/Redis)
  • D — internal DogStatsD metrics
  • E — stretch goals (stochastic rounding, bucket end-time, header naming)

Full plan in css-spec-coverage-plan.md (out-of-repo).

ichinaski added 4 commits May 14, 2026 15:24
Asserts that span.http.method and span.http.route metadata are populated as HTTPMethod and HTTPEndpoint in the /v0.6/stats payload, per CSS v1.2.0 spec §5. Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.
Asserts Hostname, Env, Version, Service, RuntimeID, and Sequence are populated in the /v0.6/stats payload per CSS v1.2.0 spec §3 (deployment-level identifiers). Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.
Asserts that ContainerID, Tags, ImageTag, AgentAggregation, and ProcessTagsHash are absent or empty in the tracer-sent /v0.6/stats payload, per CSS v1.2.0 spec §3 (these fields are agent-populated). Mark nodejs, php, ruby, cpp as missing_feature.
Asserts spans with the _dd.partial_version metric set are excluded from stats aggregation, per CSS v1.2.0 spec §7 (Span Exclusions). A control span without the metric must still produce stats. Mark nodejs, php, ruby, cpp as missing_feature.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

CODEOWNERS have been resolved as:

manifests/cpp.yml                                                       @DataDog/dd-trace-cpp
manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/php.yml                                                       @DataDog/apm-php @DataDog/asm-php
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby
manifests/rust.yml                                                      @DataDog/apm-rust
tests/parametric/test_library_tracestats.py                             @DataDog/system-tests-core @DataDog/apm-sdk-capabilities
utils/build/docker/nodejs/parametric/server.js                          @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/python/parametric/apm_test_client/server.py          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core

ichinaski added 3 commits May 15, 2026 11:23
Test agent returns the /v0.6/stats request body as a base64-encoded str, not bytes. Mypy correctly flagged the annotation mismatch on TS011.
Split type-and-truthiness assertions into separate checks per ruff's pytest rule against compound assert statements.
@datadog-prod-us1-6
Copy link
Copy Markdown

datadog-prod-us1-6 Bot commented May 15, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 306dd8d | Docs | Datadog PR Page | Give us feedback!

ichinaski added 10 commits May 15, 2026 11:39
…ey fail

CI revealed real per-SDK gaps:
- python: parametric harness does not flush /v0.6/stats (TS011-TS014 all)
- golang: HTTPEndpoint not populated from http.route; Hostname not set on payload (TS011, TS012)
- dotnet: _dd.partial_version spans not excluded from stats (TS014)

These reflect real implementation gaps in those SDKs (or the parametric harness), not test bugs — markers explain the gap per spec section.
Resolve manifest/dotnet.yml conflict on TS001 version (main bumped to <3.43.0).
Replace ': ' with ' - ' in CSS v1.2.0 missing_feature reasons to avoid YAML
parsing errors (colon was being interpreted as a mapping value).
Java fails TS011 (HTTPMethod=None), TS012 (Hostname=''), and TS014 (partial.snapshot not excluded) on both dev and prod. TS013 passes. Rust fails all 4 (parametric harness does not flush /v0.6/stats, same root cause as python).
TS011: was setting only http.route. dd-trace-go (and the spec field name) reads http.endpoint. Now sets both http.endpoint and http.route. Also adds DD_TRACE_RESOURCE_RENAMING_ENABLED=true so dd-trace-java's gate on HTTPMethod/HTTPEndpoint extraction (Config.java:2278) is on.

TS012: tracers do not auto-detect hostname in the parametric harness; pin DD_HOSTNAME=test-host so the field is populated as the spec requires.

Removed missing_feature markers from golang (TS011, TS012) and java (TS011, TS012) — those were test bugs, not implementation gaps. Java's TS014 marker remains: dd-trace-java's exclusion uses the internal longRunningVersion field, not the _dd.partial_version metric, which the spec mandates.
…ERVICE env

Root cause of python failures: dd-trace-py >= 3.x delegates CSS to libdatadog's native TraceExporter. The exporter only flushes /v0.6/stats on its 10-second internal timer or on shutdown — the parametric server's /trace/stats/flush endpoint was still using the long-removed Python-side SpanStatsProcessorV06 and silently no-op'd.

- Update the parametric server to fall back to writer.on_shutdown() + writer.recreate() when the legacy processor is absent. This deterministically flushes libdatadog stats at the end of each parametric test.
- TS012 also needed DD_SERVICE (payload-level Service is the configured main service name per spec §3, not the per-span service). Added to library_env alongside DD_HOSTNAME.

With these fixes, all four python CSS tests pass locally. Removing python missing_feature markers for TS011-TS014.
dd-trace-go option.go:297 and dd-trace-java Config.java:2005 only populate the Hostname field on the ClientStatsPayload when DD_TRACE_REPORT_HOSTNAME is on. Without it, both SDKs return empty Hostname even when DD_HOSTNAME is set.
dd-trace-go stats.go:181 only writes Service at the per-bucket StatSpanConfig level (the span's service), not at the payload level. The spec mandates Service in ClientStatsPayload. This is the same divergence already documented for Go on test_top_level_service in the e2e suite.
Two fixes informed by checking the trace-agent's actual use of these fields:

1. Service: the trace-agent uses payload-level ClientStatsPayload.Service only as a partition-key hint in PayloadAggregationKey.BaseService (client_stats_aggregator.go:178). The per-bucket ClientGroupedStats.Service is the spec-required source of truth that ends up at the backend. dd-trace-go intentionally writes Service only at the per-bucket level (stats.go:181). Accept either location.

2. Env: dd-trace-java's WellKnownTags does not apply the spec's 'unknown-env' default when DD_ENV is unset. Pin DD_ENV (plus DD_VERSION for completeness) so the assertion is deterministic across SDKs.

Removed the Go TS012 marker — the test now passes for Go under the lenient Service assertion.
dd-trace-go stats.go:96-103 builds the PayloadAggregationKey without a RuntimeID field, so the /v0.6/stats payload sent to the agent has RuntimeID empty. The trace-agent passes RuntimeID through to the backend (writer/stats.go:408) for message-uniqueness/deduplication but doesn't aggregate by it. Functionally non-fatal, but it's a spec-mandated field.
@Eldolfin
Copy link
Copy Markdown
Contributor

Relevant to phase B: #6952
And C: #6648

Both are still in draft because I'm waiting for at least one tracer to pass the tests (they do on local versions of dd-trace-py + libdatadog tho)

@ichinaski ichinaski marked this pull request as ready for review May 19, 2026 09:43
@ichinaski ichinaski requested review from a team as code owners May 19, 2026 09:43
@ichinaski ichinaski requested review from a team as code owners May 19, 2026 09:43
@ichinaski ichinaski requested review from Anilm3, daniel-romano-DD, manuel-alvarez-alvarez, mtoffl01, quinna-h and rachelyangdog and removed request for a team May 19, 2026 09:43
@amarziali amarziali requested a review from sarahchen6 May 19, 2026 09:45
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbdda2e69a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/parametric/test_library_tracestats.py
@bwoebi
Copy link
Copy Markdown
Contributor

bwoebi commented May 19, 2026

@ichinaski I see your AI wrote CSS not implemented in tracer - but that's not true. PHP implements it.
Maybe something does not quite work as specified? But at least it's implemented.

I see there are multiple stats computation testsuites, PHP implements tests/stats/test_stats.py, I didn't realize there's also a parametric equivalent.

@ichinaski
Copy link
Copy Markdown
Contributor Author

@ichinaski I see your AI wrote CSS not implemented in tracer - but that's not true. PHP implements it. Maybe something does not quite work as specified? But at least it's implemented.

I see there are multiple stats computation testsuites, PHP implements tests/stats/test_stats.py, I didn't realize there's also a parametric equivalent.

Sorry for the noise, I am still assessing what the Client Libraries MCP is reporting about the status of v1.2.0 implementation. I will put together all the resources and share them for review via Slack. I will certainly need some help understanding each SDKs status.

ichinaski added 4 commits May 20, 2026 12:28
The PHP system-tests parametric markers said 'php has not implemented
stats computation yet', but dd-trace-php shipped CSS in v1.19.0 via a
hybrid C+Rust+sidecar architecture. The blocker for the Phase A
parametric tests is that the PHP parametric app's /trace/stats/flush
handler is a no-op (utils/build/docker/php/parametric/server.php:262),
not a missing SDK feature. Update the markers to reflect this.
Both SDKs ship CSS but their parametric servers had no-op flush handlers,
so TS011-TS014 had no way to drive a stats send before assertions.

- nodejs: drive tracer._tracer._processor._stats.onInterval() and capture
  the span-stats writer's _sendPayload callback so the response waits for
  the HTTP PUT to /v0.6/stats to complete.
- php: call dd_trace_synchronous_flush(1000); per ext/ddtrace.c:3286-3310
  that invokes ddog_sidecar_flush with traces_and_stats=true, which drains
  both span and stats buckets through the sidecar transport in one call.
After the parametric /trace/stats/flush fixes for nodejs and php, the
Tracestats markers attributed to 'harness lacks stats flush' are stale.

- nodejs: drop 11 'nodejs has not implemented stats computation yet'
  markers. Keep TS014 with corrected reason citing the real SDK gap
  (span_stats.js onSpanFinished only filters TOP_LEVEL_KEY/MEASURED,
  not _dd.partial_version). Keep TS008 (broken everywhere).
- php: collapse 11 'PHP parametric app lacks stats-flush API' markers
  into one class-level Test_Library_Tracestats: v1.19.0 gate, matching
  the existing tests/stats/test_stats.py: v1.19.0 pattern. Keep TS008.
- rust: correct the misleading 'rust parametric harness does not flush
  /v0.6/stats' reason on TS011-TS014. dd-trace-rs only has the
  DD_TRACE_STATS_COMPUTATION_ENABLED config flag and no aggregation or
  exporter, so the harness no-op is downstream of an SDK gap.
PHP CSS stats need bucket aging >20s (libdatadog buffer_len*bucket_size)
or DD_TRACE_FORCE_FLUSH_ON_SHUTDOWN+process exit. dd_trace_synchronous_flush
passes force=false to the sidecar, so the parametric pattern cannot observe
/v0.6/stats. Restored PHP /trace/stats/flush to NOP and updated the manifest
reason to reflect the architectural limitation instead of the v1.19.0 marker.

Node.js: added per-test missing_feature markers for TS001/TS003/TS005/TS006/TS007
describing the dd-trace-js SDK gaps surfaced by the now-running tests.
Copy link
Copy Markdown
Collaborator

@cbeauchesne cbeauchesne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From framework usage, all good. Good you get a review from someone familair with the tested feature ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants